designs often are used in studies of organizational development, leadership
training, and job enrichment where ratings on some variable of interest
are taken both before and after the change. Golembiewski, Billingsley, and
Yeager (1976), however, identified three conceptually different types of
change that can occur with such self-report data, and they referred to these
as alpha, beta, and gamma change. They proposed a method for measuring
gamma change, but stated a need for methodologies to measure alpha and
beta change. Subsequent papers by Zmud and Armenakis (1978) and Lindell and
Drexler (1979) focused on the measurement of all three types of change.

The present paper critically reviews this work on alpha, beta, and gamma
change. Methodological problems, the need for large sample sizes, and the
use of only group-level data are cited as disadvantages of the previously
proposed methods. Drawing on the current work of Howard et al. (1979) on
response shifts in self-report data, we propose a method using profile analysis
that is capable of indexing all three types of change at both the group and
individual levels of analysis. This method uses Pre ratings and Post ratings,
but also uses a retrospective "Then" rating. The retrospective rating asks the
participant to think back to how the situation was before the change and to
re-rate that situation from his or her current perspective.

Evidence is presented that meaningful knowledge about the effects of change can
at times be gained best by examining the difference between the Post and the
retrospective Then ratings rather than the difference between the Pre and Post
ratings. Advantages and limitations of the proposed method are discussed. We
believe that this new method has the potential to reconcile conflicting results
in self-report change studies, such as those in the job enrichment area, where
objective job changes are not always reflected in self-report Pre and Post
ratings of job dimensions.
Evaluating Planned Organizational Change: A Proposed
Method for the Assessment of Alpha, Beta, and Gamma
Change at the Individual and Group Level

Abstract 

Three conceptually different types of change that can occur with self-report
data were identified by Golembiewski, Billingsley, and Yeager (1976). Past
research on this topic is critically reviewed, and a new technique is proposed
that is capable of indexing all three types of change at both the individual
and group levels of analysis.

The accurate measurement of change is crucial when longitudinal designs
are used to study the impacts of interventions. Statistical problems associated
with the general measurement of change have been discussed (cf. Harris, 1963;
Cronbach & Furby, 1970). Golembiewski, Billingsley, and Yeager (1976), however,
identified a different set of problems with the measurement of change when
self-report data are used as criteria in studies of organizational development
(OD), leadership training, and job enrichment in which Pre and Post ratings are
obtained from participants. They proposed that three conceptually different
types of change can occur, and they called these types alpha, beta, and gamma
change. Offering a method for assessing gamma change, the authors stated a
need for methodologies to assess alpha and beta change. Since the Golembiewski
et al. (1976) paper, Zmud and Armenakis (1978) and Lindell and Drexler (1979)
have commented on methods for assessing all three types of change.

In this paper we will discuss problems with the proposed methodologies
and will offer a new approach for assessing alpha, beta, and gamma change. We
will begin with the three definitions of change offered by Golembiewski et al.
(1976). We then will briefly discuss the Zmud and Armenakis (1978) and the
Lindell and Drexler (1979) papers with regard to the measurement of alpha and
beta change. This section will conclude with problems in the measurement of
gamma change. In the next section, we will discuss a body of new research that
introduces innovations with the potential for measuring beta change. This
research focuses on response-shift bias in self-report data. The third section
of the paper will discuss: a method for the assessment of beta change at the
individual level that uses profile analysis; a method for the assessment of
gamma change; a method for aggregating individual change data so that
group-level change can be examined; and some limitations of the proposed
methods.

Review  of  the  Problem  and  Previously  Suggested  Solutions 
Golembiewski et al., in their discussion of Pre and Post intervention
self-report data, defined alpha, beta, and gamma change as follows: "Alpha change
involves a variation in the level of some existential state, given a constantly
calibrated measuring instrument related to a constant conceptual domain. Beta
change involves a variation in the level of some existential state, complicated
by the fact that some intervals of the measurement continuum associated with a
constant conceptual domain have been recalibrated. Gamma change involves a
redefinition or reconceptualization of some domain, a major change in the
perspective or frame of reference within which phenomena are perceived and
classified, in what is taken to be relevant in some slice of reality" (1976,
pp. 134-135).

With regard to self-report data, alpha change represents an unbiased measure of
variation in some state between Time 1 (T1) and Time 2 (T2), where the
participant's report of change is taken on a constantly calibrated instrument.
Beta change refers to an observed variation in some state where the apparent
change is due to an instrument that has been recalibrated by the participant
between the T1 and T2 assessments. Campbell and Stanley (1963) called this
threat to internal validity instrumentation. To the extent that beta change has
occurred, comparisons of Pre and Post intervention measures will give a biased
estimate of the effect of the intervention. That is, observed differences
between Pre and Post measures reflect an unknown amount of true change and an
unknown amount of change due to instrumentation. Gamma change refers to a
redefinition or reconceptualization, by the participant, of the phenomenon
being measured between the T1 and T2 assessments. To the extent that gamma
change has occurred, it may be misleading to compare Pre and Post intervention
self-report data. As an illustration of the three types of change, suppose that
leader supportiveness is measured on a 20-point scale and that the mean on the
Pre measure is 13 whereas the mean on the Post measure is 14 (assume that
higher scores indicate greater supportiveness). Golembiewski et al. (1976)
suggest that this change of one unit could reflect an actual, although slight,
increase in supportiveness, which would be alpha change; an unknown amount of
change in either direction if the scale values have been recalibrated, for
example because participants now use different positive or negative end points
when making a response, which would be beta change; or a change in the
participants' conceptualization of the construct of leader supportiveness,
which would be gamma change. Furthermore, more than one type of change could
occur as a result of an intervention.
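To make the ambiguity in the leader-supportiveness example concrete, the sketch below assumes a deliberately simple additive model, which is our own illustrative assumption rather than Golembiewski et al.'s formalism: the observed Pre-Post difference equals true change plus a recalibration shift. The function name `observed_change` and the scenario values are hypothetical. Several different combinations of true change and recalibration reproduce the same observed one-unit gain.

```python
# Hypothetical additive model (an illustrative assumption only):
# observed Post - Pre difference = true change + recalibration shift.
def observed_change(true_change: float, recalibration_shift: float) -> float:
    """Observed Post - Pre difference on the 20-point supportiveness scale."""
    return true_change + recalibration_shift


pre_mean, post_mean = 13.0, 14.0
observed = post_mean - pre_mean  # the one-unit change in the example

# Each (true change, recalibration shift) pair yields the same observed +1.
scenarios = {
    "pure alpha change": (1.0, 0.0),
    "larger true gain masked by stricter standards": (3.0, -2.0),
    "true decline hidden by more lenient standards": (-1.0, 2.0),
}
for label, (true_change, shift) in scenarios.items():
    assert observed_change(true_change, shift) == observed
    print(f"{label}: true={true_change:+.0f}, shift={shift:+.0f}, "
          f"observed={observed:+.0f}")
```

Because the shift term is unobserved in a Pre/Post design, nothing in the data alone distinguishes these scenarios.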

Obviously, it is important to understand which type of change has occurred if the
effects of interventions are to be unambiguously examined. In the sections that
follow, limitations of previously proposed methods for determining which type
of change has occurred will be reviewed.

Zmud and Armenakis (1978) suggested that alpha and beta change can be
differentiated when Pre and Post ratings are collected on both actual criterion
levels and ideal criterion levels. Through comparison of actual scores, ideal
scores, and differences between actual and ideal scores, they maintain, it is
possible to infer alpha or beta change, assuming no gamma change. Basically,
the logic is that if ideal scores have changed, then participants have
recalibrated the measurement scale. Examination of difference scores will then
clarify whether only beta change has occurred or whether both alpha and beta
change have occurred.
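The quantities this approach compares can be laid out in a few lines. The sketch below is a simplification under stated assumptions: it computes only the changes in actual, ideal, and actual-ideal gap scores and flags possible recalibration when the ideal score moves, and it does not reproduce Zmud and Armenakis's full decision rules. The `tolerance` cut-off is hypothetical and stands in for a proper significance test; all names and values are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Ratings:
    actual: float  # mean rating of the current (actual) state
    ideal: float   # mean rating of the ideal state


def compare_actual_ideal(pre: Ratings, post: Ratings,
                         tolerance: float = 0.5) -> dict:
    """Quantities compared in the actual/ideal approach (simplified sketch).

    `tolerance` is a hypothetical cut-off standing in for a significance
    test on the change in ideal scores.
    """
    gap_pre = pre.ideal - pre.actual     # Pre actual-ideal difference
    gap_post = post.ideal - post.actual  # Post actual-ideal difference
    ideal_change = post.ideal - pre.ideal
    return {
        "actual_change": post.actual - pre.actual,
        "ideal_change": ideal_change,
        "gap_pre": gap_pre,
        "gap_post": gap_post,
        # Under this logic, a shift in ideal scores signals a recalibrated scale.
        "recalibration_suspected": abs(ideal_change) > tolerance,
    }


result = compare_actual_ideal(Ratings(actual=8, ideal=18),
                              Ratings(actual=11, ideal=17))
print(result)
```

With these hypothetical values the Pre gap is 10 points and the Post gap is 6 points, matching the kind of comparison criticized in the paragraph that follows: once recalibration is suspected, the two gaps are not obviously on the same scale.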

There are several problems with this method. First, just because Post
intervention scores suggest that the actual is now closer to the ideal does
not mean that the intervention has had the intended effect. If the measurement
scale has been recalibrated, i.e., beta change, we do not know whether a Pre
intervention difference of 10 points can be compared to a Post intervention
difference of 6 points. Because of possible recalibration of the scale end points
or of the intervals between the end points, the Post intervention difference may
indicate movement toward the ideal, away from the ideal, or no movement at all.
Second, because participants are likely to place the ideal toward the positive
end of the scale, it may be difficult to obtain a statistically significant
difference between ideal scores because of a ceiling effect. This statistical
problem is important because, according to Zmud and Armenakis (1978), beta change
cannot be detected unless there first is a difference between ideal scores.
Finally, these same authors suggested that examination of actual scores and
difference scores would indicate whether alpha change, beta change, or both have
occurred. Wall and Payne (1973), however, have shown that the use of such scores
leads to a situation in which there almost always has to be a negative
relationship between actual scores and difference scores. This method also
assumes that a difference score of 2 points, for example, has the same meaning
regardless of whether it occurs at the positive or negative end of the scale.
Imparato (1972) has shown that this assumption is probably incorrect.
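The built-in negative relationship between actual scores and actual-ideal difference scores can be seen in a small simulation. This is a sketch under our own assumptions, not a reproduction of Wall and Payne's analysis: actual and ideal ratings are drawn independently from hypothetical normal distributions, so any correlation between actual scores and (ideal minus actual) arises purely from the way the difference score is constructed. With equal variances the expected correlation is about -0.71.

```python
import random


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


random.seed(1)
# Hypothetical data: actual and ideal ratings drawn independently.
actual = [random.gauss(10, 2) for _ in range(1000)]
ideal = [random.gauss(17, 2) for _ in range(1000)]
gap = [i - a for a, i in zip(actual, ideal)]  # ideal minus actual

print(round(pearson(actual, ideal), 2))  # expected near 0 (independent draws)
print(round(pearson(actual, gap), 2))    # expected strongly negative
```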

These reasons, in addition to several others mentioned by Zmud and Armenakis
(1978), limit the utility of actual, ideal, and difference scores for the
determination of whether alpha and beta change have occurred, and for the
assessment of the impact of the intervention.

Lindell and Drexler (1979) also commented on the Golembiewski et al. (1976)
paper and questioned the severity of the beta change problem. They argue that
if beta change occurs when intervals of the measurement continuum have been
recalibrated, then the solution is to use psychometrically reliable scales
consisting of multiple items with behavioral anchors. Such scales reduce the
likelihood that instrumentation bias, or beta change, will occur. According to
Lindell and Drexler, the problem of beta change is avoided when psychometrically
sound scales are employed.

We question this conclusion. Lindell and Drexler (1979) did not offer a
method for assessing beta change should it exist. Rather, they defined beta
change as a non-problem when appropriate scales are used. But without a
technique for assessing beta change, there is no empirical way to test Lindell
and Drexler's assertion. Although we agree on the importance of psychometrically
sound rating scales, proof by assertion in the absence of data is not an
acceptable method for resolving problems with beta change. Furthermore, in a
later section we will present data that we believe show the occurrence of beta
change even when scales of the type recommended by Lindell and Drexler are used.
We also note that Golembiewski et al. essentially are saying that, with regard
to beta change, there is no such thing as a truly "psychometrically sound" scale.

We now will turn to gamma change. Gamma change refers to a reconceptualization
of the phenomenon as a result of the intervention. Golembiewski et al. (1976)
demonstrated that gamma change might be indicated through comparison of factor
structures over time. Suppose that assessments are taken at T1, T2, and T3 and
that the intervention occurred between T2 and T3. If gamma change occurred and
beta change did not, then coefficients of congruence between the T1 and T2
factor structures should be high, whereas coefficients of congruence between the
T1 and T3 and between the T2 and T3 factor structures should be substantially
reduced. If there is high congruence among all pairs of factor structures, then
gamma change is not thought to have occurred.
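The comparison of factor structures relies on a coefficient of congruence between loading vectors. A common choice is Tucker's coefficient, the sum of cross-products of two loading vectors divided by the square root of the product of their sums of squares. The sketch below assumes that coefficient and uses hypothetical loadings for five items on a single factor at T1, T2, and T3, constructed so that the structure reorganizes after the intervention. As the text notes, there is no accompanying statistical test for whether two such coefficients differ reliably, so the printed values are descriptive only.

```python
from math import sqrt


def congruence(a, b):
    """Tucker's coefficient of congruence between two factor-loading vectors."""
    num = sum(x * y for x, y in zip(a, b))
    return num / sqrt(sum(x * x for x in a) * sum(y * y for y in b))


# Hypothetical loadings of five items on one factor at each assessment.
t1 = [0.80, 0.75, 0.70, 0.10, 0.05]
t2 = [0.78, 0.72, 0.74, 0.12, 0.08]
t3 = [0.20, 0.15, 0.60, 0.70, 0.75]  # reorganized after the intervention

pairs = {"T1-T2": (t1, t2), "T1-T3": (t1, t3), "T2-T3": (t2, t3)}
for label, (x, y) in pairs.items():
    print(label, round(congruence(x, y), 2))
```

Under the Golembiewski et al. logic, this pattern (T1-T2 near 1, both comparisons with T3 markedly lower) is the one read as gamma change, provided beta change can be ruled out.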

Again, we believe that there are problems with this method for assessing
gamma change, even though Zmud and Armenakis (1979) endorsed the factor
congruency approach. First, the method assumes that beta change did not occur.
In the absence of a technique for assessing beta change, the use of this factor
analytic method for assessing gamma change is extremely questionable. Lindell
and Drexler (1979) raised a second problem. They correctly argued that either
alpha or beta change for a subset of subjects could produce changes in factor
structures. Consequently, we may incorrectly conclude that gamma change has
occurred when the change in factor structures was due to alpha or beta change.
Third, there is at present no statistical test for determining whether
coefficients of congruence differ reliably. Finally, factor analysis and similar
multivariate procedures require that the ratio of participants to items be at
least 3:1 for any type of reliability to be obtained, and this ratio is a liberal
rule of thumb. The implication is that factor analysis cannot be used in studies
that involve few participants relative to the number of items in the scale.
Although some OD-related interventions involve large numbers of participants,
Porras and Berg (1978) found that 46% of the studies they reviewed had N's of 100
or less and that 58% of the studies focused on the individual as the level of
analysis. Thus, the use of factor analysis would be limited even if it were
appropriate from a statistical viewpoint.
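The 3:1 rule of thumb translates directly into a ceiling on scale length for a given sample. The helper names below are hypothetical, and the ratio is treated as a fixed minimum purely for illustration; as the text stresses, 3:1 is a liberal lower bound.

```python
def max_items(n_participants: int, ratio: int = 3) -> int:
    """Largest item count satisfying a participants-to-items ratio of ratio:1."""
    return n_participants // ratio


def min_participants(n_items: int, ratio: int = 3) -> int:
    """Smallest sample satisfying a participants-to-items ratio of ratio:1."""
    return n_items * ratio


# A study at the Porras and Berg (1978) threshold of N = 100:
print(max_items(100))        # at most 33 items under the 3:1 rule
print(min_participants(40))  # a 40-item survey would need 120 participants
```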

It also may be useful to raise a new issue that concerns the level of analysis
at which change is measured. Implicit in the papers of Golembiewski et al.
(1976), Zmud and Armenakis (1978), and Lindell and Drexler (1979) is the
examination of change through differences in group means or factor structures.
Control groups or baseline data serve as the basis for this comparison. We
propose that, in addition to knowledge of group differences, it is important to
assess change at the level of the individual. Although we realize that this
limits the anonymity of responses, lack of attention to individual change may
confuse interpretation of group effects. First, relatively large changes for a
few individuals could shift the group mean and lead to the conclusion that the
intervention had a group-wide effect and was successful, when in reality the
intervention was only partially successful. Thus, later measurement of group
process and outcome data may reflect little change even though self-report
group-level data suggest that a change has taken place. Second, in groups
composed of organizational members from different levels of the organization,
it may be important to know among which members a change occurred. If changes
occurred only among key members, then this may be reflected in other
group-level process and outcome changes. Third, examination of change at the
individual level will allow for determination of differential effects of the
intervention. Some people may show beta change whereas others may show gamma
change. Group-level data would mask these differential effects. Finally,
knowledge of individual change may be more useful in feedback discussions of
survey data, allowing a more focused examination and understanding of the
effects of the intervention. Although problems exist with the aggregation of
data and the examination of group effects is important, reliance only on
group-level data may be misleading.
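The first point, that a few large individual changes can masquerade as a group effect, can be shown with a small hypothetical data set. The ten change scores below are invented for illustration. Two members change by 5 points and the rest do not change at all, yet the group mean suggests a uniform one-point gain.

```python
# Hypothetical Post - Pre changes for ten group members on one scale.
changes = [0, 0, 0, 0, 0, 0, 0, 0, 5, 5]

group_mean_change = sum(changes) / len(changes)  # 1.0: looks like a group gain
changed_members = [i for i, c in enumerate(changes) if c != 0]
share_changed = len(changed_members) / len(changes)  # only 20% changed

print(group_mean_change, share_changed, changed_members)
```

Only the individual-level view reveals which members (here, positions 8 and 9) account for the entire group-level effect.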

Research  on  Response  Shift  Bias 

In a series of 12 studies, Howard and his colleagues (Howard et al., 1979;
Howard & Dailey, 1979; Howard, Schmeck & Bray, 1979; Howard, Dailey & Gulanick,
Note 1; Howard, Millham, Slaten & O'Donnell, Note 2; and Bray & Howard, Note 3)
have demonstrated that experimental interventions that employ self-report
measures of Pre and Post intervention ratings are subject to an
instrumentation-related source of contamination known as response-shift bias.
The difficulty arises when the experimental intervention changes the subject's
evaluation standard with regard to the dimension measured by the self-report
instrument. The definition of response-shift bias is similar to the Golembiewski
et al. (1976) conception of beta change, and Howard et al. (1979) note that when
response-shifts occur, even true experimental designs (Campbell & Stanley, 1963)
are incapable of providing an unbiased estimate of treatment effects, i.e.,
alpha change.

An example of a response-shift might prove enlightening. A workshop participant
might believe at pretest that she is an "average" leader. But the intervention
expands her conception of the range of leadership effectiveness that can occur.
Consequently, at posttest she believes that her pretest level of functioning was
really "below average". If she now rates herself as "average" at posttest, we
might erroneously conclude that the workshop had no effect.

In all investigations conducted by Howard and his colleagues to date,
response-shifts have served to increase the accuracy of subjects' ability to rate
themselves after the intervention. Subjects report increased insight into their
real level of functioning and state that this resulted from their intervention
experiences. Subjects reliably conclude at posttest that their pre-intervention
ratings were inaccurate.

Howard and his colleagues recommend that at the post-intervention session
subjects be asked to respond to each item on the self-report measure twice.
First, they are to report how they perceive themselves to be at present. This
corresponds to the usual post-intervention assessment. Immediately after
answering each item in this manner, they are to answer the same item again, only
this time in reference to how they now perceive themselves to have been just
before the workshop was conducted. Howard has labeled this new assessment the
"Then" measure. The difference between Pre and Then self-report ratings is
called the response-shift. Howard et al. (1979) suggest that changes in
treatment subjects' standards of measurement are responsible for response-shift
effects. Because Then and Post ratings are made in close proximity, if Howard et
al. (1979) are correct, both ratings are likely to be made from the same
perspective and thus to be free of response-shift bias, or beta change.
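The three ratings yield three differences for each item. The sketch below applies them to the workshop participant described earlier, assuming a hypothetical 5-point scale on which 3 means "average" and 2 means "below average". The sign convention for the response-shift (Then minus Pre) is our assumption; the text defines it only as the difference between Pre and Then. The class and function names are illustrative.

```python
from typing import NamedTuple


class ItemRatings(NamedTuple):
    pre: float   # rating made before the intervention
    then: float  # retrospective rating of the pre-intervention state, made at posttest
    post: float  # rating of the present state, made at posttest


def change_indices(r: ItemRatings) -> dict:
    """Pre/Post change, Then/Post change, and response-shift for one item."""
    return {
        "pre_post": r.post - r.pre,
        "then_post": r.post - r.then,
        "response_shift": r.then - r.pre,  # sign convention is an assumption
    }


# Hypothetical 5-point scale: 3 = "average", 2 = "below average".
leader = ItemRatings(pre=3, then=2, post=3)
print(change_indices(leader))
# {'pre_post': 0, 'then_post': 1, 'response_shift': -1}
```

Under this convention the Pre/Post change always equals the Then/Post change plus the response-shift, so a Pre/Post difference of zero here conceals a one-point Then/Post gain offset by a one-point shift in the participant's standard.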

When considering the impact of response-shift bias and the use of retrospective
measures to attenuate this source of bias, two issues become salient.

First, does the Then/Post approach provide a substantially different set of
conclusions about the effectiveness of an intervention than does the traditional
self-report Pre/Post approach? Several studies by Howard and his colleagues
revealed significant Then/Post treatment effects where Pre/Post analyses produced
nonsignificant results. There have been no instances in which Pre/Post analyses
produced significant results while Then/Post analyses produced nonsignificant
results. However, the two approaches do not always lead to different conclusions
about the value of interventions. Ralph (1975; Pilot Studies 1 and 2) reported
nonsignificant intervention effects whether Pre/Post or Then/Post self-report
ratings were employed, whereas significant self-reported intervention effects
were found by Howard, Schmeck, and Bray (1979); Howard and Dailey (1979); and
Howard, Dailey, and Gulanick (Note 1) using both the Pre/Post and Then/Post
approaches. In all, in five of the eleven studies to date in which direct
comparisons between Pre/Post and Then/Post approaches could be made, the
Then/Post analysis yielded a drastically different set of conclusions regarding
the effectiveness of the intervention from the Pre/Post approach.

A second issue to be considered is which method provides the more valid
results. In five separate analyses of the impact of intervention procedures
ranging across assertiveness training, interview skills training, helping skills
training, and interpersonal effectiveness training (Howard et al., 1979; Howard
& Dailey, 1979; Howard, Dailey & Gulanick, Note 1), the results from the
Then/Post measurement approach were more similar to objective ratings of change
in subject behavior and performance than were the results obtained from
traditional Pre/Post self-report methods. In a study investigating actual
changes in the amount of material acquired in a college course, Then/Post
self-reports of content learned reflected the students' actual mastery more
accurately than did the Pre/Post self-report approach (Howard, Schmeck & Bray,
1979). In estimating participants' change in an assertiveness workshop, Then/Post
ratings were more in agreement with the facilitator's ratings of change than
were Pre/Post self-reports (Howard, Millham, Slaten & O'Donnell, Note 2).
Finally, Bray and Howard (Note 3) evaluated a workshop designed to improve the
teaching skills of a group of teachers. The correlation of Then/Post ratings of
change with changes from before to after the workshop in independent judges'
ratings of the teachers' skills was significantly higher than the correlation of
teachers' Pre/Post self-ratings of change with the changes in the judges'
ratings. Overall, in no study comparing Then/Post and Pre/Post self-report
methods was the Pre/Post measure superior or even equivalent to the Then/Post
approach in reflecting behavioral indices of change.
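The validity comparison in the Bray and Howard study amounts to correlating each self-report change score with an external criterion of change. The sketch below uses invented data for six hypothetical participants; it computes only the two Pearson correlations and does not include the test for comparing dependent correlations that would be needed to say one is significantly higher.

```python
def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical change scores for six participants.
judges_change = [1, 2, 0, 3, 1, 2]     # change in independent judges' ratings
then_post_change = [1, 2, 1, 3, 1, 2]  # Post minus Then self-ratings
pre_post_change = [0, 1, 1, 0, 1, 0]   # Post minus Pre self-ratings

r_then_post = pearson(then_post_change, judges_change)
r_pre_post = pearson(pre_post_change, judges_change)
print(round(r_then_post, 2), round(r_pre_post, 2))
```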

Several potential concerns about the adequacy of retrospective pre-intervention
measures have been investigated. Perhaps, at the time of the posttest, subjects
have a distorted memory of their pre-intervention level of functioning, and
Pre/Then differences are due simply to systematic memory biases. Study 5 of
Howard et al. (1979) considered this possibility in the evaluation of a
semester-long communication skills training program. Immediately after
completing Post and Then self-report ratings, subjects were asked to recall their
Pre scores (Memory). Mean Memory ratings were virtually identical to Pre ratings
but significantly different from the Then scores, suggesting that the
response-shift reflects something more than mere systematic memory distortion.
Subsequent interviews with subjects revealed that many had an uncanny
recollection of their pre-intervention responses and that they now saw their
pretest responses as inaccurate reflections of their pre-intervention level of
functioning. Subjects were typically aware that their retrospective ratings
provided a different picture of their pre-intervention levels of functioning
than their self-report pre-intervention responses did, and they volunteered
explanations of why they believed their pre-ratings to be inaccurate. Similar
findings regarding memory distortion effects were also reported by
Howard and Dailey (1979).
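The memory check compares the same subjects' ratings under three conditions, which suggests paired comparisons. The sketch below assumes a paired t statistic (mean difference divided by its standard error) as the comparison, which may differ from the analysis actually used, and invents ratings for six subjects. Memory ratings track Pre ratings closely while Then ratings fall consistently below them, the pattern the study reports.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(x, y):
    """Paired t statistic for x - y (requires non-constant differences)."""
    d = [a - b for a, b in zip(x, y)]
    return mean(d) / (stdev(d) / sqrt(len(d)))


# Hypothetical ratings for six subjects on one scale.
pre = [3, 4, 3, 5, 4, 3]     # ratings made before the program
memory = [3, 4, 3, 5, 4, 4]  # recalled Pre ratings, collected at posttest
then = [2, 3, 2, 4, 4, 2]    # retrospective Then ratings, collected at posttest

t_memory = paired_t(memory, pre)  # small: recall of Pre is accurate
t_then = paired_t(then, pre)      # large and negative: a response-shift
print(round(t_memory, 2), round(t_then, 2))
```

Significance would still have to be judged against a t distribution with n - 1 degrees of freedom; that step is omitted here to keep the sketch dependency-free.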

A related concern is the effect of retrospective Then ratings on Post
ratings. Theoretically, there is little reason why the Post rating should
be altered by asking for the Then rating, but it is a potential methodological
issue. Two studies examined the effect of Then ratings on Post ratings (Howard
et al., 1979; Howard, Dailey & Gulanick, Note 1). Subjects given the same
intervention were asked at posttest to provide either Post ratings or Post and
Then ratings. The results of both studies indicated that Post ratings were
unaffected by asking for a Then rating.

Response-style biases may be another factor that could limit the utility of
retrospective Then ratings. One might hypothesize that the potential for this
contamination is heightened when a subject is asked to provide both
pre-intervention and post-intervention ratings of his or her level of
functioning at one point in time. To examine this question, Howard, Millham,
Slaten and O'Donnell (Note 2) investigated the operation of social desirability
and impression management response biases on retrospective measures of
assertiveness. The correlations of social desirability scores with
pre-intervention self-reports of assertiveness were higher than those obtained
between social desirability and retrospective self-reports on the same measure.
It would appear, within the context of the intervention procedures employed in
the Howard, Millham, Slaten and O'Donnell (Note 2) study, that socially desirable
responding was actually diminished when the retrospective methodology was used.
Additionally, a bogus pipeline procedure was employed for half of the treatment
subjects in this study. Bogus pipelines are techniques wherein subjects are led
to believe that the experimenter is able to determine the veracity of their
self-report responses. Millham and Kellogg (in press) have demonstrated that
differences between scores obtained under bogus and non-bogus pipeline
conditions reflect the operation of impression management and hence should be
indicative of attempts to meet implicit task demands to demonstrate improvement
to the evaluator. The shifts in self-report measures of pre-intervention states
that occur with the retrospective methodology were found to be no different
under bogus pipeline and non-bogus pipeline conditions. The results did not
support the hypothesis of greater bias on retrospective measures and, in fact,
were consistent with an interpretation of reduced bias. Obviously, further
studies in this domain are needed.

However, it should be noted that continued findings demonstrating that
evaluations using retrospective ratings show greater concurrent validity with
objective and behavioral indices than traditional self-report approaches might
render the contamination issue moot.

Finally, it is our belief that the scales used to obtain the Pre, Post, and
Then measures in the studies by Howard and his colleagues had the psychometric
features recommended by Lindell and Drexler (1979). Recall that Lindell and
Drexler stated that beta change would be unlikely to occur when psychometrically
sound scales with behavioral anchors are used to obtain Pre and Post ratings.
Because response-shifts were nonetheless observed in these studies, this appears
not to be the case.

Method  for  Assessing  Alpha,  Beta,  and  Gamma  Change:  A  Proposal 

This section will begin with a brief discussion of the methodological
implications of the work by Howard and his colleagues on the response-shift.
Next, we will present methodologies for assessing alpha, beta, and gamma change
at the level of the individual. Finally, we will conclude with suggested
approaches for aggregating individual-level change data to group-level change
data such that comparisons between intervention and control groups can be made.

The methodological implications of the work by Howard and his colleagues for
assessing group level alpha and beta change are rather straightforward. Pre measures
are taken before the intervention, and both Post and retrospective Then measures are
taken after the intervention. Comparable Pre, Post, and Then measures are obtained
from a control group. If the group mean on the Pre measure differs from the group
mean on the Then measure for the intervention group, while no difference between
these measures is found for the control group, it is concluded that beta change has
occurred in the intervention group. Alpha change is then defined as the difference
between the Post and Then group means. When no evidence of beta change is found,
alpha change can be assessed by comparing Pre with Post measures as well as Post
with Then measures. It should be noted that Howard's research suggests that even
when there are no significant differences between Pre and Then measures, the
correlations between Post minus Then score differences and objective measures of
change are greater than the correlations between Post minus Pre score differences
and objective measures of change. This does not mean, however, that Pre ratings no
longer need to be collected. Pre ratings are necessary for the identification of
beta change and, as will be shown later, for the identification of gamma change.
Rather, unless there is strong reason to suspect Then scores, Howard simply
recommends that greater reliance be placed on results from Post and Then self-report
measures in studies of change. At present, there is no reason to suspect Then scores
except in situations where it is to the participants' obvious advantage to give false
Then responses, where participants are confused about the instructions, or where
participants in a no-treatment control group are asked to give Post and Then ratings
within a few hours or days of the Pre ratings.
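The group-level bookkeeping just described can be sketched in a few lines. The sketch assumes each participant's ratings have already been reduced to a scale mean. The data and the function name `howard_group_summary` are hypothetical. It deliberately omits the significance tests a real analysis would need, so it is not Howard's published procedure.

```python
from statistics import mean

# Hypothetical per-person scale scores (each the mean across a scale's items),
# one (Pre, Post, Then) tuple per participant. Illustrative values only.
intervention = [(4.0, 4.5, 3.0), (3.5, 4.0, 2.5), (4.5, 5.0, 3.5)]
control = [(4.0, 4.0, 4.0), (3.5, 3.5, 3.5), (4.5, 4.5, 4.5)]


def howard_group_summary(group):
    """Group mean differences: Pre minus Then (beta) and Post minus Then (alpha)."""
    pre, post, then = (mean(column) for column in zip(*group))
    return {"pre_minus_then": pre - then, "post_minus_then": post - then}


for name, group in [("intervention", intervention), ("control", control)]:
    print(name, howard_group_summary(group))
```

In this made-up intervention group, the Pre and Then means differ by 1.0 while the control group shows no such difference. That is the pattern the text reads as beta change, and the 1.5-point Post minus Then difference would then be read as alpha change. In a real study, each difference would first be tested statistically.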

Although Howard's procedure for studying the response shift and for assessing alpha
and beta change represents a considerable improvement over the approaches suggested
by Zmud and Armenakis (1978) and Lindell and Drexler (1979), a different approach
based upon profile analysis offers an alternative that may be preferable, primarily
because it focuses attention on the individual. Also, the work of Howard and his
colleagues has not to date considered gamma change or its measurement. We propose
that alpha, beta, and gamma change can be described at both the individual and the
group level with Pre, Post, and Then ratings that are examined using profile analysis.

Profile analysis is a method for examining differences between two patterns of
scores on the same set of items or scales. Various forms of profile analysis have
been used in such diverse areas as the classification of people as schizophrenic or
normal (Klett & Pumroy, 1972) and the classification of organizations as Systems I,
II, III, or IV (Likert, 1967).

A basic issue in profile analysis concerns the appropriate method for determining
the degree to which two profiles are similar or dissimilar. Nunnally (1978) states
that pairs of profiles can be compared according to their level, their shape, and
their dispersion. Level refers to the mean of the scores on all items in the profile.
Two profiles are similar in level if the mean of the item scores in one profile is
not significantly different from the mean of the item scores in the other profile.
Two profiles are similar in shape if the correlation between the two profiles is
positive and significantly different from zero. Finally, two profiles are similar in
dispersion if the standard deviation of the item scores in one profile is not
significantly different from the standard deviation of the item scores in the other
profile. Confusion over the interpretation of similarity exists because two profiles
can have (1) similar levels yet dissimilar or even opposite shapes; (2) different
levels but similar shapes; or (3) similar levels and shapes but different
dispersions. These examples are only three of the many combinations that could
occur; graphic representations of these three possibilities are presented in Table 1.


Insert  Table  1  about  here 
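The three descriptors (level, shape, dispersion) are easy to compute. The sketch below reproduces, numerically, the three cases listed above. The profiles are hypothetical, and the helper names `pearson` and `compare_profiles` are our own. It only describes the profiles; it does not test any of the differences for significance.

```python
from math import sqrt
from statistics import mean, stdev


def pearson(x, y):
    """Pearson correlation between two profiles, computed item by item."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a flat profile has no defined shape correlation")
    return sxy / sqrt(sxx * syy)


def compare_profiles(p, q):
    """Level (means), shape (correlation), and dispersion (SDs) of two profiles."""
    return {
        "level": (mean(p), mean(q)),
        "shape": pearson(p, q),
        "dispersion": (stdev(p), stdev(q)),
    }


base = [2, 4, 6, 8]
cases = {
    "same level, opposite shape": [8, 6, 4, 2],
    "different level, same shape": [4, 6, 8, 10],
    "same level and shape, smaller dispersion": [3.5, 4.5, 5.5, 6.5],
}
for label, other in cases.items():
    print(label, compare_profiles(base, other))
```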


We propose that alpha, beta, and gamma change for any individual in an intervention
or control group can be identified and measured through the selective comparison of
profiles for the Pre, Post, and Then ratings made by that individual on a set of
items that make up a single construct or dimension. We will discuss beta change
first. Once the existence of beta change at the level of the individual has been
determined, the assessment of alpha change is relatively straightforward. We will
conclude with a discussion of gamma change.

Howard's work suggests that a retrospective Then measure taken at the same time as
the Post measure should be on the same recalibrated scale as the Post measure.
Consequently, any recalibration by an individual due to beta change should primarily
affect the level of that individual's profile and not its shape or dispersion.
Strictly speaking, this assumes that beta change has occurred to a similar extent for
every item in the unidimensional scale. This may be impossible to determine, but in
practice it should not be a concern if, and this is important, multiple items are
used to assess the same underlying dimension. Therefore, beta change would be
reflected by a difference between an individual's mean score across all items on the
Pre measure and his/her mean score across all items on the Then measure. A dependent
t-test in which N is the number of items (or scale scores) in the profile provides a
descriptive index of beta change for the individual. It may be unwise, however, to
refer the observed t-value to a table of critical values of the t-distribution in an
attempt to assess the significance of change for the individual, because the
assumption of independence of observations is violated: all measures come from the
same individual. We will discuss the implications of this violation in a later
section.

In any event, the t-value for the individual provides a useful description of change
at the individual level and can also serve as an index number through which group
level change can be investigated, as will be discussed later. Alpha change can be
examined in a similar manner by calculating a dependent t-value between the profile
means of an individual's Post and Then measures. Again, this value should be judged
descriptively because the assumption of independence is violated.
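The two individual-level indices can be sketched as a single paired t computed over the items of one person's profiles. N is the number of items, as the text specifies. The six-item ratings are hypothetical, and `dependent_t` is our own name. In line with the caution above, the result is meant to be read as a descriptive index rather than referred to a t table.

```python
from math import sqrt
from statistics import mean, stdev


def dependent_t(x, y):
    """Paired t-value over the items of one person's two profiles (N = items).

    Descriptive only: items answered by one respondent are not independent.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have equal length of at least 2")
    d = [a - b for a, b in zip(x, y)]
    sd = stdev(d)
    if sd == 0:
        raise ValueError("item differences have zero variance; t is undefined")
    return mean(d) / (sd / sqrt(len(d)))


# Hypothetical ratings by one person on a six-item scale.
pre = [4, 5, 4, 5, 4, 5]
post = [6, 7, 6, 7, 7, 7]
then = [4, 5, 5, 4, 4, 5]

beta_t = dependent_t(pre, then)    # Pre vs Then profile means
alpha_t = dependent_t(post, then)  # Post vs Then profile means
print(f"beta t = {beta_t:.2f}, alpha t = {alpha_t:.2f}")
```

Here the Pre and Then means coincide (beta t = 0) while Post exceeds Then (alpha t of about 7.05). Read descriptively, this person shows alpha change rather than beta change.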

It might be useful to pause here and reflect on how to interpret the results of an
inferential test statistic in a descriptive rather than an inferential manner. We are
hesitant to encourage overinterpretation of the proposed methodology. Because
violations of a test statistic's assumptions have unknown consequences for the bias
of the test, we do not believe that the obtained probability values should be taken
at face value. Such caution, however, does not irrevocably reduce the utility of such
tests. Probability values are one kind of aid to interpretation, and there is nothing
sacrosanct about the .05 level. A correlation of r = .03 based on a sample of 100,000
may be statistically significant, yet descriptively it indicates a weak relationship.
Similarly, if we find a t-value of 6.50 for the difference between a person's Post and
Then profile means and a t-value of .05 for the difference between the same person's
Pre and Then profile means, then, descriptively speaking, there appears to be much
greater evidence of alpha change than of beta change. We would prefer to be able to
offer unbiased inferential test statistics; however, having been unable to discover
any such tests, we are willing to trust the researcher to examine the data and to
openly discuss and interpret the results.

Such descriptive reporting is not without precedent and is often used in N = 1
multiple baseline studies of learning and behavior change.
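The r = .03 example can be made concrete with the usual t statistic for testing a Pearson correlation against zero, t = r√(n − 2)/√(1 − r²). The normal approximation used for the p-value is our own simplification. It is reasonable here only because the degrees of freedom are so large.

```python
from math import erfc, sqrt


def t_for_r(r, n):
    """t statistic (df = n - 2) for testing a Pearson r against zero."""
    return r * sqrt(n - 2) / sqrt(1 - r * r)


r, n = 0.03, 100_000
t = t_for_r(r, n)
# With about 100,000 df the t distribution is essentially normal, so a normal
# approximation is used for the two-sided p-value.
p_two_sided = erfc(t / sqrt(2))
shared_variance = r * r  # proportion of variance the two variables share

print(f"t = {t:.2f}, p = {p_two_sided:.1e}, r^2 = {shared_variance:.4f}")
```

The t-value is about 9.5 and the p-value is far below .05. Yet the two variables share less than a tenth of one percent of their variance, which is the gap between statistical and descriptive significance that the text describes.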

Determination of individual level gamma change is less straightforward than that of
alpha or beta change. Nevertheless, we believe it is possible to specify the
conditions that would be indicative of gamma change. Golembiewski et al. (1976)
proposed that gamma change would be reflected by a lack of congruence between the
factor structures of ratings taken before and after the intervention. Factor
structures are based upon the pattern of correlations or the pattern of covariances
among variables. We propose that gamma change can be identified through examination
of profile shapes, i.e., correlations, and profile dispersions, i.e., variances or
standard deviations. This is accomplished through pairwise profile analysis of Pre,
Post, and Then ratings on a set of items that are thought to measure a single
construct or dimension. An example will demonstrate how we propose to infer gamma
change at the individual level. First, correlations between profiles are computed for
a participant's Pre and Post measures, Pre and Then measures, and Post and Then
measures. If gamma change has occurred, the correlation between the Post and Then
measures should be substantially greater than the correlations between the Pre and
Post measures and between the Pre and Then measures. In other words, after the
intervention the participant perceives the shape of the profile, or the degree to
which particular items tend to "go together," differently. The subject has
reconceptualized the dimension under investigation. Thus, comparison of the
correlations among Pre, Post, and Then profile shapes represents one descriptive
definition of the existence of gamma change. How similar or dissimilar the
correlations must be before the researcher concludes that gamma change has occurred
is difficult to specify. It would be possible to use the Hotelling-Williams statistic
(Darlington, 1975) as a descriptive aid, but this statistic requires considerable
computational effort. Again, we must trust that the results are openly presented and
that reasonable judgment is exercised.
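The three profile correlations are trivial to compute today, and so is a Hotelling-Williams comparison of them. The sketch uses Williams' version of the Hotelling test in the form usually stated for two correlations that share one variable (here, Post). The formula has not been checked against Darlington (1975) and should be verified before use. The eight-item profiles are hypothetical. With n equal to the number of items, the result is a descriptive aid only.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a flat profile has no defined correlation")
    return sxy / sqrt(sxx * syy)


def profile_correlations(pre, post, then):
    """The three pairwise profile correlations for one participant."""
    return {
        "pre_post": pearson(pre, post),
        "pre_then": pearson(pre, then),
        "post_then": pearson(post, then),
    }


def williams_t(r_jk, r_jh, r_kh, n):
    """Williams' t (df = n - 3) comparing r_jk with r_jh; both involve variable j.

    r_kh is the correlation between the two non-shared variables; n is the
    number of paired observations (here, items in the profile).
    """
    det_r = 1 - r_jk**2 - r_jh**2 - r_kh**2 + 2 * r_jk * r_jh * r_kh
    r_bar = (r_jk + r_jh) / 2
    denom = 2 * (n - 1) / (n - 3) * det_r + r_bar**2 * (1 - r_kh) ** 3
    return (r_jk - r_jh) * sqrt((n - 1) * (1 + r_kh) / denom)


# Hypothetical eight-item profiles for one participant.
pre = [5, 3, 6, 2, 5, 4, 6, 3]
post = [6, 6, 3, 5, 2, 6, 4, 5]
then = [5, 5, 2, 4, 2, 5, 3, 4]

rs = profile_correlations(pre, post, then)
# j = Post, k = Then, h = Pre: compare r(Post, Then) with r(Pre, Post).
wt = williams_t(rs["post_then"], rs["pre_post"], rs["pre_then"], len(pre))
print(rs, round(wt, 2))
```

In this made-up case the Post and Then profiles correlate about .98 while both correlate negatively with Pre. That is the pattern the text associates with gamma change.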

A second operationalization of gamma change involves the comparison of profile
dispersions. We propose that the standard deviations of the items in the Pre, Post,
and Then profiles be examined for differences in dispersion. If the standard
deviations of the Post and Then profiles are not different from each other but each
differs from the standard deviation of the Pre profile, we propose that this change in
dispersion is another form of evidence for a reconceptualization of the dimension
under investigation. An inferential test for differences between nonindependent
variances is not widely known, but a test statistic does exist (Kirk, 1978, p. 277).
Again, however, the resulting statistic must be judged descriptively because of the
lack of independence.
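One widely used statistic for two nonindependent variances is the Pitman-Morgan test, sketched below. We have not confirmed that this is the statistic Kirk (1978) presents, so treat it as a stand-in. The profiles are hypothetical. As with the other individual-level indices, n is the number of items and the result is descriptive only.

```python
from math import sqrt
from statistics import mean, stdev


def pearson(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def pitman_morgan_t(x, y):
    """Pitman-Morgan t (df = n - 2) for equality of two correlated variances."""
    n = len(x)
    sx, sy = stdev(x), stdev(y)
    r = pearson(x, y)
    if abs(r) >= 1:
        raise ValueError("perfectly correlated profiles: statistic undefined")
    return (sx**2 - sy**2) * sqrt(n - 2) / (2 * sx * sy * sqrt(1 - r**2))


# Hypothetical eight-item Pre and Post profiles for one participant.
pre = [4, 5, 4, 5, 5, 4, 4, 5]
post = [2, 7, 3, 6, 7, 2, 4, 6]

t_pre_vs_post = pitman_morgan_t(pre, post)
print(round(t_pre_vs_post, 2))  # negative: Pre is less dispersed than Post
```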

Finally, a third operationalization of gamma change would consider the degree to
which both profile shapes and dispersions have changed as a result of the
intervention. Profiles could differ in shape but not in dispersion, in dispersion but
not in shape, or in both shape and dispersion. We propose that the third, and perhaps
strongest, description of gamma change be defined as the case in which substantial
differences exist for both shapes and dispersions.

The above guidelines for the identification of individual level gamma change are
abstract not by choice but because of the lack of appropriate test statistics. Also,
many combinations other than those we have described could exist. For example, the
correlations between all pairs of profiles could be substantially different from each
other. We have attempted to define the types of relationships that should exist if
gamma change is thought to have occurred. To the extent that these specific
relationships are not found, the likelihood of gamma change is diminished. At a
minimum, the proposed methodologies suggest how the researcher might study alpha,
beta, and gamma change at the level of the individual. No other methodologies have
even considered these three types of change at this level of analysis.

As was stated earlier, it would be desirable to be able to use the recommended
methods for the analysis of group level change data in addition to the analysis of
individual change data. This would be especially valuable when participants are
assigned to intervention versus control groups or to different levels or types of
interventions. We now turn to a discussion of how individual data might be aggregated
to produce evidence of group level change.

First, let us consider the examination of alpha change. Given the existence of beta
change, alpha change at the individual level is determined through analysis of Post
and Then scores. One method of arriving at a significance test for group level alpha
change would be to follow the procedure used by Howard. In Howard's approach, an index
number for each participant is calculated as the difference between the mean Post
and the mean Then score for that person. An analysis of variance (or a t-test in the
case of two groups) is then performed, comparing the mean index scores of different
groups of individuals. Although this approach is reasonable, an alternative approach
makes direct use of the t-values calculated for each individual, as outlined in the
section on individual level alpha change. Using the t-value as an index number, each
person in the intervention group will have a t-value, as will each person in the
control group. These t-values can be used as the dependent variable in a comparison
of the two groups. A Mann-Whitney U test (two groups) or a Kruskal-Wallis H test (more
than two groups) can then be conducted on the ranked t-values.

A statistically significant difference between intervention and control groups, with
greater t-values in the intervention group, would suggest that the intervention had
an effect. Such a nonparametric procedure is preferred because the t-values for
individuals should probably be considered ordinal rather than interval data. The
advantages of this approach over Howard's are that it makes direct use of individual
t-values and that the standard deviation term in the denominator of each t-value
controls for differential contraction or expansion of the response scale for that
individual. Furthermore, in contrast to the individual level alpha change index, the
nonparametric procedure can be interpreted as an inferential test statistic, i.e.,
its probability value is unbiased.
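The two-group comparison of individual t-values can be sketched as a Mann-Whitney U test. The t-values are hypothetical. The sketch uses a normal approximation for the p-value, with average ranks for ties but no tie or continuity correction. With samples this small, exact tables or a library routine such as SciPy's `scipy.stats.mannwhitneyu` would normally be preferable.

```python
from math import erfc, sqrt


def average_ranks(values):
    """Ranks 1..n; tied values receive the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def mann_whitney_u(group_a, group_b):
    """U for group_a, its z score, and a two-sided normal-approximation p."""
    n1, n2 = len(group_a), len(group_b)
    ranks = average_ranks(list(group_a) + list(group_b))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    z = (u1 - n1 * n2 / 2) / sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return u1, z, erfc(abs(z) / sqrt(2))


# Hypothetical individual Post-vs-Then t-values (alpha change indices).
intervention_t = [6.5, 4.1, 7.2, 3.8, 5.0, 2.9]
control_t = [0.4, -1.1, 1.3, 0.2, 2.0, -0.3]

u, z, p = mann_whitney_u(intervention_t, control_t)
print(f"U = {u}, z = {z:.2f}, p = {p:.4f}")
```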

Group level beta change would be examined in a similar manner. A Mann-Whitney U test
or a Kruskal-Wallis H test would be conducted on the ranked t-values associated with
each individual's comparison of Pre and Then profile means. Recall that alpha change
was assessed with Post and Then profile means.
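When more than two groups are compared, the Kruskal-Wallis H statistic replaces U. The sketch below assumes two hypothetical intervention types plus a control group, each contributing individual Pre-vs-Then t-values. It computes H without a tie correction. For exactly three groups (two degrees of freedom), the chi-square upper tail is exp(−H/2), which gives the approximate p-value.

```python
from math import exp


def average_ranks(values):
    """Ranks 1..n; tied values receive the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H (no tie correction) for two or more groups."""
    values = [v for g in groups for v in g]
    n = len(values)
    ranks = average_ranks(values)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum * rank_sum / len(g)
        start += len(g)
    return 12 / (n * (n + 1)) * total - 3 * (n + 1)


# Hypothetical Pre-vs-Then t-values (beta change indices) for three groups.
program_a = [3.1, 2.5, 4.0, 3.6]
program_b = [1.2, 0.8, 1.9, 1.5]
control = [0.1, -0.4, 0.6, 0.3]

h = kruskal_wallis_h(program_a, program_b, control)
p_approx = exp(-h / 2)  # chi-square upper tail, valid for 3 groups (df = 2)
print(f"H = {h:.2f}, approximate p = {p_approx:.4f}")
```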

Finally, examination of group level gamma change would proceed in a manner similar to
that used for group level alpha and beta change. In this case, however, the scores or
index numbers that are ranked for analysis with the Mann-Whitney U test or the
Kruskal-Wallis H test are constructed from differences between pairs of profile
correlations or pairs of profile standard deviations. Taking profile correlations
first, we propose that the researcher compute correlations between each pair of Pre,
Post, and Then profiles for each participant in both the intervention and control
groups. Thus, each individual has three correlations: one for Pre/Post, one for
Pre/Then, and one for Post/Then. Recall that these correlations were computed for
individual level gamma change. Next, the raw difference score between each pair of
correlations is computed. If gamma change has occurred as a result of the
intervention, then for people in the intervention group the correlation between the
Pre/Post profiles should be similar to the correlation between the Pre/Then profiles,
i.e., a small difference score. But the correlation between the Post/Then profiles
should be greater than the other two correlations, i.e., a large difference score
($r_{\mathrm{Post,Then}} > r_{\mathrm{Pre,Post}} = r_{\mathrm{Pre,Then}}$). For people
in the control group, all three correlations should be similar in magnitude
($r_{\mathrm{Post,Then}} = r_{\mathrm{Pre,Post}} = r_{\mathrm{Pre,Then}}$). In the case
of two groups, a Mann-Whitney U test of group differences would be computed three
times: once on the ranked difference scores $r_{\mathrm{Post,Then}} - r_{\mathrm{Pre,Post}}$,
once on the ranked difference scores $r_{\mathrm{Post,Then}} - r_{\mathrm{Pre,Then}}$, and
once on the ranked difference scores $r_{\mathrm{Pre,Post}} - r_{\mathrm{Pre,Then}}$. If
gamma change had occurred for people in the intervention group but not for people in
the control group, then significant Mann-Whitney U's should be found between groups
on the Post/Then minus Pre/Post and the Post/Then minus Pre/Then correlation
differences. But there should be no group effect for the Pre/Post minus Pre/Then
correlation differences. In other words, gamma change predicts that Post/Then profile
correlations should be larger than Pre/Post or Pre/Then profile correlations and that
the latter two correlations should not differ from each other. The Mann-Whitney U test
examines the rank order of the differences between correlations as a function of
intervention versus control group membership.
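The construction of the three correlation difference scores, and the U statistic computed on each, can be sketched as follows. The profiles are hypothetical, and the helper names are ours. U is computed by direct pair counting, which gives the same value as the rank-sum form. The resulting U values would still have to be referred to Mann-Whitney tables or a normal approximation.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_differences(pre, post, then):
    """One person's three raw differences between profile correlations."""
    r_pre_post = pearson(pre, post)
    r_pre_then = pearson(pre, then)
    r_post_then = pearson(post, then)
    return {
        "post_then_minus_pre_post": r_post_then - r_pre_post,
        "post_then_minus_pre_then": r_post_then - r_pre_then,
        "pre_post_minus_pre_then": r_pre_post - r_pre_then,
    }


def u_statistic(a, b):
    """Mann-Whitney U for sample a: pairs with a > b count 1, ties count 1/2."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)


def group_gamma_u(intervention, control):
    """U per difference score; each group is a list of (pre, post, then) profiles."""
    d_int = [correlation_differences(*person) for person in intervention]
    d_con = [correlation_differences(*person) for person in control]
    return {
        key: u_statistic([d[key] for d in d_int], [d[key] for d in d_con])
        for key in d_int[0]
    }


# Hypothetical four-item profiles: (pre, post, then) for each participant.
intervention = [
    ([1, 2, 3, 4], [2, 1, 4, 3], [2, 1, 4, 3]),
    ([2, 4, 6, 8], [4, 2, 8, 6], [4, 2, 8, 6]),
]
control = [
    ([1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]),
    ([4, 3, 2, 1], [4, 3, 2, 1], [4, 3, 2, 1]),
]
print(group_gamma_u(intervention, control))
```

As the text predicts for gamma change, both Post/Then differences favor the intervention group (U = 4, the maximum for two people per group). The Pre/Post minus Pre/Then difference shows no group effect.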

Examination of group level gamma change using profile dispersions would require that,
for each individual, the raw difference scores between the standard deviations of the
Pre, Post, and Then profiles be computed ($\mathrm{Post}_{SD} - \mathrm{Then}_{SD}$;
$\mathrm{Post}_{SD} - \mathrm{Pre}_{SD}$; and $\mathrm{Then}_{SD} - \mathrm{Pre}_{SD}$).
If the scale is unidimensional at pre-test but multidimensional at post-test, then the
standard deviations of the Post and Then profiles should be larger than the standard
deviation of the Pre profile. This requires that, at the time of analysis, all items
in the scale be scored so that a high (or low) response value for each item reflects
a response in the same direction.

The next step is to perform three Mann-Whitney U tests comparing participants in the
intervention and control groups: one on the ranked difference scores using Post-Then
standard deviations, one on the ranked difference scores using Post-Pre standard
deviations, and one on the ranked difference scores using Then-Pre standard
deviations. Gamma change in the intervention group but not in the control group would
be reflected by a significant Mann-Whitney U for the Post-Pre and Then-Pre ranked
differences. In other words, people in the intervention group should have large
Post-Pre and Then-Pre differences in standard deviations, while people in the control
group should have small differences on these profiles.
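The dispersion version can be sketched the same way. It includes the reverse-scoring step the procedure requires. The function names, the 1-to-7 scale, and the profiles are hypothetical. Which items are reverse-worded must come from the instrument itself.

```python
from statistics import stdev


def reverse_score(items, reversed_items, low, high):
    """Re-key reverse-worded items so that high values mean the same direction."""
    return [low + high - v if i in reversed_items else v for i, v in enumerate(items)]


def sd_differences(pre, post, then):
    """One person's raw differences between profile standard deviations."""
    s_pre, s_post, s_then = stdev(pre), stdev(post), stdev(then)
    return {
        "post_minus_then": s_post - s_then,
        "post_minus_pre": s_post - s_pre,
        "then_minus_pre": s_then - s_pre,
    }


def u_statistic(a, b):
    """Mann-Whitney U for sample a: pairs with a > b count 1, ties count 1/2."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)


def group_dispersion_u(intervention, control):
    """U per SD difference score; each group is a list of (pre, post, then)."""
    d_int = [sd_differences(*person) for person in intervention]
    d_con = [sd_differences(*person) for person in control]
    return {
        key: u_statistic([d[key] for d in d_int], [d[key] for d in d_con])
        for key in d_int[0]
    }


# Hypothetical 1-7 ratings; item index 1 is reverse-worded on this scale.
pre = reverse_score([4, 3, 5, 5], {1}, 1, 7)    # becomes [4, 5, 5, 5]
post = reverse_score([2, 2, 3, 7], {1}, 1, 7)   # becomes [2, 6, 3, 7]
then = reverse_score([2, 2, 3, 7], {1}, 1, 7)
print(group_dispersion_u([(pre, post, then)], [(pre, pre, pre)]))
```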

For both sets of difference scores, i.e., profile correlations and profile standard
deviations, the probability values of the Mann-Whitney U tests can be interpreted at
face value. As with individual level gamma change, we propose that the strongest
demonstration of gamma change be defined as the case in which the predictions for
group differences in both profile shapes and profile dispersions are supported.

Limitations and Advantages

Although our proposed methods alleviate many of the problems associated with other
methods, there are necessarily limitations to our approach. First, given our
methodology, it is impossible to perform meaningful significance tests at the
individual level. The t-values proposed here are meant to provide a descriptive
measure of the degree of change for a particular individual rather than a basis for
statistical inference. An inferential test here would require the assumption that
items on the scale are independent of one another. To the extent that this assumption
is violated, significance tests for an individual are potentially quite misleading,
because the assumption of independence of observations is crucial for the t-test
(see, for example, Scheffé, 1959). Although the proposed approach does not provide
significance tests at the individual level, none of the methods proposed by previous
investigators even considered a description of change at the level of the
individual. Our method at least serves to direct attention toward the individual as
well as toward the individual's group.

Fortunately, the assumption of independence is unnecessary for assessing the three
types of change at the group level. Although t-values and difference scores form the
dependent variables in these group analyses, the significance tests employed here do
not depend upon the sampling distributions of the individual level statistics. We add,
however, that computing a difference score assumes that the numbers from which it is
computed are themselves interval in nature. This clearly is not the case for
correlations. Thus, difference scores computed from raw correlations could yield
different Mann-Whitney U values than difference scores computed from correlations that
were first transformed to Fisher z-scores. In this instance, we could have suggested
the use of the Hotelling-Williams statistic (see Darlington, 1975), but, in our
opinion, the mathematical advantages of this test do not merit the extreme
computational effort involved.
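A two-person example shows how the choice between raw and z-transformed correlations can reverse a ranking, and therefore change a Mann-Whitney U. The correlations are hypothetical. Fisher's transformation is z = atanh(r) = ½ ln((1 + r)/(1 − r)), which Python's `math.atanh` computes directly.

```python
from math import atanh

# Hypothetical profile correlations for two people: (r_Post,Then, r_Pre,Post).
people = {"A": (0.90, 0.80), "B": (0.50, 0.35)}

# Raw difference scores, as in the procedure described above.
raw_diff = {k: r1 - r2 for k, (r1, r2) in people.items()}
# The same differences after Fisher's r-to-z transformation.
z_diff = {k: atanh(r1) - atanh(r2) for k, (r1, r2) in people.items()}

print({k: round(v, 3) for k, v in raw_diff.items()})  # B's difference is larger
print({k: round(v, 3) for k, v in z_diff.items()})    # A's difference is larger
```

Because the Mann-Whitney U depends only on ranks, a reversal like this between two people in different groups is enough to change the statistic.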

One further assumption, however, is important for assessing gamma change at either the
group or the individual level. As previously stated, it is assumed that the items on
the scale being used to measure change are unidimensional. To understand the necessity
of this assumption, suppose that different items on one scale in fact measured two
different constructs, and that beta change (but no alpha or gamma change) occurred for
the items measuring one construct while no change of any type occurred for the other
construct. In this case, the Then and Post profiles will correlate highly, but the Pre
and Then profiles, as well as the Pre and Post profiles, will not. However, this is
exactly the pattern of correlations that reflects gamma change. Thus, in this
situation, what appears to be gamma change by our method is actually beta change for a
subset of the constructs being measured by the scale. On the other hand, if the scale
truly is unidimensional, this pattern of correlations can arise only because of gamma
change. Viewed in this light, gamma change occurs when a person's self-perceptions are
changing, but changing in different ways for different items.
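A small numerical example illustrates this confound. The sketch builds a hypothetical eight-item scale that actually mixes two constructs. It applies a pure beta shift to the items of one construct only and leaves the other construct unchanged. The resulting correlations reproduce the pattern that the profile method would read as gamma change.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical scale: items 0-3 tap construct A, items 4-7 tap construct B.
pre = [5, 6, 5, 6, 5, 6, 5, 6]
# Beta change on construct A only: looking back, the person now rates the
# earlier state of the A items 3 points lower; B items are unchanged.
then = [v - 3 if i < 4 else v for i, v in enumerate(pre)]
# No alpha change: current (Post) ratings equal the recalibrated Then ratings.
post = list(then)

r_pre_post = pearson(pre, post)
r_pre_then = pearson(pre, then)
r_post_then = pearson(post, then)
print(f"Pre/Post {r_pre_post:.3f}, Pre/Then {r_pre_then:.3f}, Post/Then {r_post_then:.3f}")
```

Post/Then correlates perfectly, while Pre/Post and Pre/Then both fall to about .32. No reconceptualization has taken place.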

A final limitation is that, by investigating three types of change at the group
level, a large number of significance tests are conducted. Using a significance level
of .05 for each test in a study, for example, would make the probability of at least
one Type I error somewhere in the study much greater than .05. However, the fact that
the three types of change are conceptually distinct argues that each separate test
could be conceptualized as addressing a distinct hypothesis, so that the proliferation
of tests is not necessarily a problem. If the cost of Type I errors is judged to be
severe, a conservative approach using the Bonferroni method of adjusting the
significance level for each test could be employed.
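Some simple arithmetic shows the scale of the problem. Assume a two-group study that uses all the group-level procedures described above: one test for alpha change, one for beta change, and three correlation-difference plus three SD-difference tests for gamma change. The familywise figure below assumes independent tests, which tests on scores from the same people are unlikely to be. The Bonferroni level does not depend on that assumption.

```python
sig_level = 0.05
# Group-level tests in a two-group study using the procedures above.
tests = {"alpha": 1, "beta": 1, "gamma_shape": 3, "gamma_dispersion": 3}
k = sum(tests.values())

# Chance of at least one Type I error if the k tests were independent.
fwer_if_independent = 1 - (1 - sig_level) ** k
# Bonferroni: test each hypothesis at sig_level / k, which keeps the
# familywise error rate at or below sig_level regardless of dependence.
bonferroni_level = sig_level / k

print(k, round(fwer_if_independent, 3), bonferroni_level)
```

With eight tests, the chance of at least one spurious result is about one in three. The Bonferroni method would test each hypothesis at .00625.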

Although the previous discussion outlined the unresolved issues in this proposed
approach to the measurement of change (issues shared by any other existing approach),
there are at least five decided advantages to the approach presented in this paper.

First, the analyses are both idiographic and nomothetic in nature. Profile analyses
are performed on the data of individual subjects. Therefore, different types of
change can be demonstrated by different individuals. This is in marked contrast to
the previous approaches to the measurement of change. The data can later be combined
for nomothetic analyses, but the first step is idiographic.

A second advantage, related to the idiographic analysis, is that feedback regarding
the effectiveness of the intervention can be tailored to the individual. For example,
we might inform one individual that he/she had experienced a beta change whereas the
majority of the group experienced an alpha change. In doing so, we might improve the
validity and usefulness of our feedback efforts. Individually tailored feedback might
be more appropriate for subsequent discussions of implications, suggestions for future
interventions, and the like.

A third advantage involves the ability of the proposed method to deal with
interventions on small samples of subjects. Conceptually, a sample size of one
could be handled by this approach. This is in marked contrast with the factor analytic
approaches, which require large numbers of subjects relative to the number of
self-report items employed if their results are to be considered reliable. This
advantage is especially important in OD-type interventions, where small samples are
typically the rule.

Fourth, unlike previous attempts to understand change, the present proposal allows
investigators to examine the three types of change more independently. Any possible
combination of none, some, or all three types of change can be found and described for
each individual subject in the study.

The last advantage we note is related to this ability to assess the three types of
change independently: researchers are now able to obtain a much more sophisticated
understanding of exactly what effects their intervention produced. If alpha change
occurred in a few subjects and beta change in a different subset of individuals,
researchers might consider related issues, such as amount of time with the
organization or pre-intervention level of skill on the target dimensions, as clues to
why the intervention had differential impacts on subgroups of participants. Analyses
such as these will result in a more multidimensional, sophisticated, and precise
understanding of what the intervention did and did not accomplish.

There are additional consequences of adopting the present approach which cannot
really be called advantages, although they might be advantageous in some instances.
Researchers are forced to examine their raw data closely rather than relying on
group means based on scores that are summed over items. The change agent must have
a sound knowledge of the constructs/behaviors that are to be the focus of the
intervention prior to the intervention itself. That is, he/she needs a thorough
knowledge of the various scale dimensions and constructs. Also, it is important
that dimensions and constructs be assessed with multiple items where each item has
several response alternatives. Because t-values and correlations are computed on a
person's profile scores, a profile consisting of only five items, with each item
having only four response alternatives, could produce misleading results. A minor
change on one item would have a greater impact on computed profile correlations
when five items are used as opposed to, say, 10 or 15 items. Similarly, a minor
change on one item with four response alternatives would have a greater impact than
a minor change on one item with 20 response alternatives. Note that the survey
developed by Likert (1967) uses 20 response alternatives. Additionally, although the
amount of testing time is likely to be about 10% greater when Post and Then measures
are collected as opposed to Post measures only, we believe this additional time and
thought are well spent because the subject is required to focus directly upon the
phenomenon of change.
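The sensitivity argument above can be checked numerically. The following is a minimal sketch using a hypothetical profile (not data from this study): it shifts one item by a single response step and reports the Pearson correlation between the original and shifted profiles. It assumes that "profile correlation" means the ordinary Pearson r computed across items, and it represents the 20-alternative case by re-expressing the same pattern on a 1 to 20 scale; the helper names are illustrative only.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length rating profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def bump_first_item(profile, step=1):
    """Copy of the profile with the first item shifted by one response step."""
    return [profile[0] + step] + list(profile[1:])

# Hypothetical 5-item profile on a 4-point scale (values 1..4).
short = [1, 2, 3, 4, 2]
# The same pattern repeated three times: a hypothetical 15-item profile.
long_ = short * 3
# The 5-item pattern re-expressed on a 20-point scale (1, 7, 13, 19).
fine = [1 + 6 * (v - 1) for v in short]

r_short = pearson(short, bump_first_item(short))
r_long = pearson(long_, bump_first_item(long_))
r_fine = pearson(fine, bump_first_item(fine))

print(f"5 items, 4 alternatives:  r = {r_short:.3f}")
print(f"15 items, 4 alternatives: r = {r_long:.3f}")
print(f"5 items, 20 alternatives: r = {r_fine:.3f}")
```

For this made-up profile the correlations come out at roughly 0.93, 0.97, and 0.999, so the same one-step shift disturbs the short, coarse profile the most.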

Finally, we believe that the present approach can be extended to time series
data, although there is some question as to how to operationalize retrospective
Then scores. For illustration purposes, suppose that observations are collected at
four different time periods (T1, T2, T3, and T4) and that the intervention occurs
between T2 and T3. Ratings are also collected from a control group. Measures of
the current status of the phenomenon would be collected at each observation, and
these could be called Pre1, Pre2, Post1, and Post2 using current terminology.
Retrospective Then questions would be asked at T2, T3, and T4. There are two ways
to ask the retrospective Then question, however, and at present we have no data on
which method is to be preferred. One method would ask the participant to rate the
phenomenon in reference to how they currently perceive it to have been at the time
of the last observation. In other words, the first Then question refers to T1, and
the second and third Then questions refer to T2 and T3. Here, we would predict
that there would be no evidence of alpha, beta, or gamma change at either the
individual or group level for data collected at T1 and T2. But if the intervention
had an effect, then evidence of this should be seen in a comparison of T2 and T3
ratings. What to expect from a comparison of T3 and T4 ratings is less clear.
People in the control group should essentially provide similar Pre, Post, and Then
ratings throughout all time periods. But because some interventions are designed to
provide the participant with the capacity for continued change, alpha, beta, and
gamma change could be observed between T2 and T3 and even between T3 and T4.
That is, the intervention may have both an immediate and a continued effect on the
rate and type of change experienced by the participant. We propose that the three
types of change be examined for all adjoining pairs of ratings, regardless of the
time span between ratings. Through examination of individual and group results,
it should be possible for the researcher to describe and understand the short- and
long-term effects of the intervention. As with any time-series design, however,
the computational effort increases as more data are collected. Also, the task of
describing and understanding change as conceptualized by Golembiewski et al. (1976)
and as operationalized here may become even more complex.

The second operationalization of the Then rating would ask the participant to
rate the phenomenon in reference to how they currently perceive it to have been at
the time of the first, and not the most recent, observation. In other words, Then
ratings at T2, T3, and T4 all would refer back to T1. With this approach, any
pair of ratings, and not only adjoining pairs, could be analyzed as described earlier.
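The two ways of asking the Then question can be made concrete by listing, for each scheme, which earlier observation each Then rating recalls and which pairs of ratings would be analyzed. The sketch below is a hypothetical illustration: the helper names `then_reference` and `analyzable_pairs` and the scheme labels "last" and "first" are invented here, and the code only enumerates comparisons; it does not compute the alpha, beta, or gamma indices themselves.

```python
from itertools import combinations

def then_reference(times, scheme):
    """Map each observation after the first to the observation its Then rating recalls.

    scheme="last":  the Then rating at T_k refers to the preceding observation.
    scheme="first": every Then rating refers back to the first observation.
    """
    if scheme == "last":
        return {later: earlier for earlier, later in zip(times, times[1:])}
    if scheme == "first":
        return {later: times[0] for later in times[1:]}
    raise ValueError(f"unknown scheme: {scheme!r}")

def analyzable_pairs(times, scheme):
    """Pairs of observations to be analyzed under each hypothetical scheme."""
    if scheme == "last":
        # Only adjoining pairs of ratings.
        return list(zip(times, times[1:]))
    if scheme == "first":
        # Any pair of ratings, not only adjoining ones.
        return list(combinations(times, 2))
    raise ValueError(f"unknown scheme: {scheme!r}")

times = ["T1", "T2", "T3", "T4"]
for scheme in ("last", "first"):
    print(scheme, then_reference(times, scheme))
    print(scheme, analyzable_pairs(times, scheme))
```

With four observations the first scheme yields 3 comparisons and the second yields 6; with n observations these counts grow as n - 1 and n(n - 1)/2, which is one way to see why the computational work grows as more data are collected.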

We hesitate to continue with a discussion of the proposed methodology as applied
to time series designs because we have not collected data with such designs, and
conjecture at this point may prove misleading. But we heartily endorse future
research on this extension because retrospective Then ratings have been shown to
be valid for periods of up to one year after the intervention (Howard et al., 1979).
Thus, the use of such ratings may be informative about planned change that spans
several years.

Concluding Remarks

A short decade ago Cronbach and Furby (1970) asked how we should measure
change--or should we? Their conclusion was that the measurement of change was a
complex and problematic endeavor, and that alternatives to measuring change, such
as comparing post-intervention scores only, be seriously considered whenever
possible. The work of Golembiewski and Howard and their colleagues points out
that with self-report data, the measurement of change is far more complex and
problematic than even Cronbach and Furby led us to believe. Unfortunately, it now
is apparent that Cronbach and Furby's recommended solutions to measuring change
are inappropriate with self-report data (Howard, Schmeck, & Bray, 1979;
Golembiewski et al., 1976), and we are forced again to measure change. This
paper proposes a thorough and systematic technique for assessing change at the
level of the individual that is cognizant of our increased sophistication regarding
the ways in which change presents itself and of the measurement techniques
acquired over the past decade. Once again, we find that human beings are complex,
cognitive beings. Our suggestions are intended to enable us to further appreciate
human change in all its complexity.